ABSTRACT

In this paper, we present a novel error measure to compare
a segmentation against ground truth. This measure, which we
call Tolerant Edit Distance (TED), is motivated by two
observations: (1) Some errors, like small boundary shifts, are
tolerable in practice. Which errors are tolerable is
application dependent and should be a parameter of the measure.
(2) Non-tolerable errors have to be corrected manually. The
time needed to do so should be reflected by the error measure.
Using integer linear programming, the TED finds the minimal
weighted sum of split and merge errors exceeding a given
tolerance criterion, and thus provides a time-to-fix estimate. In
contrast to commonly used measures like Rand index or
variation of information, the TED (1) does not count small, but
tolerable, differences, (2) provides intuitive numbers, (3) gives a
time-to-fix estimate, and (4) can localize and classify the type
of errors. By supporting both isotropic and anisotropic
volumes and having a flexible tolerance criterion, the TED can
be adapted to different requirements. On example
segmentations for 3D neuron segmentation, we demonstrate that the
TED is capable of counting topological errors, while ignoring
small boundary shifts.


1. INTRODUCTION 

In the computer vision literature, several approaches to assess
the quality of contour detection and segmentation algorithms
can be found. Most of these measures have been designed
to capture the intuition of what humans consider to be two
similar results. In particular, these measures are supposed to
be robust to certain tolerated deviations, like small shifts of
contours. For contour detection in the Berkeley segmentation
dataset [14], for example, the precision and recall of
detected boundary pixels within a threshold distance to the
ground truth became the widely used standard [13, 1]. Contour
error measures are, however, not a good fit for segmentations,
since small errors in the detection of a contour can
lead to the split or merge of segments. Therefore, alternatives
like the Variation of Information (VOI), the Rand Index (RI) [18],
the probabilistic Rand index [20, 21], and the segmentation
covering measure [1] have been proposed.




(c) tolerable relabelings of y (d) closest relabeling to x

Fig. 1: Illustration of the Tolerant Edit Distance (TED)
between two segmentations x and y. By tolerating boundary
shifts to a certain extent, shown as shadow in (b), y is
allowed to be changed to match x as closely as possible. For
that, we consider regions obtained by combining x and y,
illustrated in (c). For each of these regions, we enumerate the
set of labels used by y that are within a threshold distance to
all locations inside the region (shown in curly brackets). This
threshold is the maximally allowed boundary shift. Note that
in this example, the region obtained from intersecting A and 3
can change its label to 1 (or keep 3), but not to 2, since it
contains points that are too far away from region 2. Regions with
only one possible label are too large to be relabeled by shifting
their boundary and have to keep their initial label. From all
the possible ways to relabel y, the relabeling (d) minimizing
the number of split and merge errors compared to x is chosen
by solving an integer linear program.


However, these measures do not acknowledge that there
are different criteria for segmentation comparison, and
instead accumulate errors uniformly, even for many small
differences that are irrelevant in practice. Especially in the field
of biomedical image processing, we are often more interested
in counting true topological errors like splits and merges of
objects, instead of counting small deviations from the ground
truth contours. This is in particular the case for imaging
methods for which no unique “ground truth” labeling exists. In the
imaging of neural tissue with Electron Microscopy (EM), for
example, the preparation protocol can alter the volume of
neural processes, such that it is hard to know what the true size
was [19]. Further, the imaging resolution and data quality
might just not be sufficient to clearly locate contours between
objects [3], resulting in a high inter-observer variability.

To address these issues, we present a novel measure to
evaluate segmentations on a clearly specified tolerance
criterion: At the core of our measure, which we call Tolerant
Edit Distance (TED)^1, is an explicit tolerance criterion (e.g.,
boundary shifts within a certain range). Using integer linear
programming, we find the minimal weighted sum of split
and merge errors exceeding the tolerance criterion, and thus
provide a time-to-fix estimate. By interpreting a segmentation
as a general labeling of voxels, our measure does not
require voxels of the same object to form a connected
component, and thus supports anisotropic volumes, missing data,
or known object connections via paths outside the volume
being considered. The reported results are intuitive and easy
to interpret, and errors can be localized in the volume. An
illustration of the TED can be found in Figure 1.

Application to Neuron Segmentation. To demonstrate the
usefulness of our measure, we present our results in the
context of automatic neuron segmentation from EM volumes, an
active field of biomedical image processing (for recent
advances, see [5, 10, 15, 16, 8]). In this field, the criterion to
assess the quality of a segmentation depends on the
biological question: On one hand, skeletons of neurons are sufficient
to identify individual neurons [17], to study neuron types and
their function [4], and to obtain the wiring diagram of a
nervous system (the so-called connectome) [3]. In these cases,
topological correctness is far more important than the
diameter of a neural process or the exact location of its boundary
(see Figure 2 for examples). On the other hand, for
biophysically realistic neuron simulation, volumetric information is
needed to model action potential time dynamics, and to
understand and simulate information processing capabilities of
single neurons [12]. In this case, the segmentation should be
close to the true volume of the reconstructed neurons. Only
small deviations in the boundary location might still be
tolerable.

Current state-of-the-art methods for automatic neuron
segmentation can broadly be divided into isotropic [15, 11,
16, 8] and anisotropic methods [5, 10, 6]. For both types,
reporting segmentation accuracy in terms of VOI or RI became
the de facto standard [15, 11, 10, 16, 8]. Less frequently
used [5, 6] are the Anisotropic Edit Distance (AED) [5] and the
Warping Error (WE) [9]. The AED is tailored to the specific
error correction steps required for anisotropic volumes (splits
and merges of 2D neuron slices within a section, connections
and disconnections of slices between sections). The WE aims
to measure the difference between ground truth and a
proposal segmentation in terms of their topological differences.
As such, the WE was the first error measure for neuron
segmentation that deals with the delicate question of up to
which point a boundary shift is not considered to be an error.
However, since the WE assumes a foreground-background
segmentation where connected foreground objects represent
neurons, it is only applicable to isotropic volumes (in
anisotropic volumes, connectedness of neurons is not always
preserved). Furthermore, only suboptimal solutions to the
WE are found using a greedy, randomized heuristic, which
makes it difficult to use for evaluation purposes.
Consequently, the WE has found its main application in the training
of neural networks for image classification [9].

proposal segmentation y

Fig. 2: Example errors made by an automatic neuron
segmentation algorithm. Errors like merges (M) and splits (S)
dramatically change the reconstructed topology and should
be avoided. Small disagreements in the boundary location
(T) are, however, tolerable and should be ignored during
evaluation.

^1 Source code available at http://github.com/funkey/ted.

2. TOLERANT EDIT DISTANCE 

The TED measures the difference between two segmentations
$x : \Omega \to K_x$ and $y : \Omega \to K_y$, where $\Omega$ is a
discrete set of voxel (or supervoxel) locations in a volume, and
$K_x$ and $K_y$ are sets of labels used by $x$ and $y$,
respectively. The difference is reported in terms of the minimal
number of splits and merges appearing in a relabeling of $y$, as
compared with $x$. How $y$ is allowed to be relabeled is defined by
a tolerance criterion, e.g., the maximal displacement of an object
boundary.

We say that a label $k \in K_x$ overlaps with a label $l \in K_y$
if there exists at least one location $i \in \Omega$ such that
$x(i) = k$ and $y(i) = l$. If $x$ and $y$ represent the same
segmentation, each label $l$ overlaps with exactly one label $k$,
and vice versa. Consequently, if a label $k \in K_x$ overlaps with
$n$ labels from $K_y$, we count it as $n - 1$ splits. Analogously,
if a label $l \in K_y$ overlaps with $n$ labels from $K_x$, we count
it as $n - 1$ merges. For two labelings $x$ and $y$, we denote as
$s(x, y)$ and $m(x, y)$ the sum of splits and merges over all
labels, respectively.
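The counting rule above can be illustrated with a few lines of Python. This is a minimal sketch, assuming both labelings are given as equal-length sequences indexed by location; the function name `count_splits_merges` and the toy labelings are hypothetical and not taken from the published source code.

```python
from collections import defaultdict

def count_splits_merges(x, y):
    """Return (s, m): total splits and merges between labelings x and y.

    x and y are equal-length sequences; position i holds the label of
    location i in the respective segmentation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must label the same locations")
    partners_of_x = defaultdict(set)  # k -> labels of y that overlap k
    partners_of_y = defaultdict(set)  # l -> labels of x that overlap l
    for k, l in zip(x, y):
        partners_of_x[k].add(l)
        partners_of_y[l].add(k)
    # A label overlapping n labels of the other segmentation counts n - 1.
    splits = sum(len(p) - 1 for p in partners_of_x.values())
    merges = sum(len(p) - 1 for p in partners_of_y.values())
    return splits, merges

x = ["A", "A", "A", "B", "B", "B"]
y = [1, 1, 2, 2, 3, 3]
print(count_splits_merges(x, y))
```

In this toy case both A and B overlap two labels of y (two splits in total), and label 2 overlaps both A and B (one merge).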

Let a tolerance function $T$ be a binary indicator on two
labeling functions $y$ and $y'$,

$$T(y, y') = \begin{cases} 1 & \text{if } y \text{ is similar to } y', \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$


Further, let $\mathcal{Y}$ be the set of all labeling functions
$y' : \Omega \to K_y$, i.e., all possible labelings of $\Omega$
using the labels of $y$, and let
$\mathcal{Y}^+(y) = \{ y' \in \mathcal{Y} \mid T(y, y') = 1 \}$
be the set of all tolerated relabelings of $y$. The TED is the
minimal weighted sum of splits and merges over all tolerable
relabelings $\mathcal{Y}^+(y)$:

$$\mathrm{TED}(x, y) = \min_{y' \in \mathcal{Y}^+(y)} \alpha \, s(x, y') + \beta \, m(x, y'), \qquad (2)$$

where the weights $\alpha$ and $\beta$ represent the time or effort
needed to fix a split or merge, respectively.

In order to find the minimum of (2), we assume that the
tolerance function is local, i.e., there exists a set $A_i$ of
tolerable labels for each location $i$, and a tolerable labeling is
any combination of those labels:

$$T(y, y') = \prod_{i \in \Omega} \mathbb{1}(y'(i) \in A_i).$$

An example of such a tolerance function is shown in
Figure 1 (c). With this assumption, we solve (2) with the
following integer linear program (ILP):

At the core of this ILP are binary indicator variables
$v = (v_{il} \in \{0,1\} \mid i \in \Omega, l \in A_i)$ to indicate
the assignment of label $l$ to location $i$. Constraints (4) and (5)
ensure that exactly one of the labels gets chosen for each location
and that each label of $y$ has to appear at least once. Further, we
introduce binary variables $a_{kl}$ that indicate the presence of a
joint assignment of label $k$ from $x$ and label $l$ from $y'$ in at
least one location. With constraints (6) and (7) we make sure
that each $a_{kl} = 1$ if and only if there is at least one location
$i \in \Omega$ such that $x(i) = k$ and $y'(i) = l$. To count the
number of times a label $k \in K_x$ is split in $y'$, we further
introduce integers $s_k \in \mathbb{N}$. These counts equal the
number of times $k$ was matched with any other label, minus one,
which we ensure with constraints (8). Analogously, we introduce
integers $m_l$ and constraints (9) for merges caused by label $l$ in
$y'$. The final split and merge numbers $s$ and $m$ are just the
sums of the label-wise splits and merges, ensured by (10) and (11).
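The constraints described in this paragraph could be written out as follows. This is a sketch reconstructed from the prose only: the equation numbers are assigned to match the references in the text, while the numbering of the objective as (3), the subscript notation $v_{il}$, and the exact form of each constraint are assumptions, so the authors' formulation may differ in detail.

```latex
\begin{align}
\min_{v,\,a,\,s,\,m} \quad & \alpha\, s + \beta\, m \tag{3} \\
\text{s.t.} \quad
& \sum_{l \in A_i} v_{il} = 1
  && \forall i \in \Omega \tag{4} \\
& \sum_{i \in \Omega :\, l \in A_i} v_{il} \geq 1
  && \forall l \in K_y \tag{5} \\
& a_{kl} \geq v_{il}
  && \forall k \in K_x,\ l \in K_y,\ i \in \Omega : x(i) = k,\ l \in A_i \tag{6} \\
& a_{kl} \leq \sum_{i \in \Omega :\, x(i) = k,\ l \in A_i} v_{il}
  && \forall k \in K_x,\ l \in K_y \tag{7} \\
& s_k = \sum_{l \in K_y} a_{kl} - 1
  && \forall k \in K_x \tag{8} \\
& m_l = \sum_{k \in K_x} a_{kl} - 1
  && \forall l \in K_y \tag{9} \\
& s = \sum_{k \in K_x} s_k \tag{10} \\
& m = \sum_{l \in K_y} m_l \tag{11}
\end{align}
```

In this form, (6) forces $a_{kl}$ to 1 whenever some location joins $k$ and $l$, and (7) forces it to 0 when no location does, which together give the "if and only if" stated above. Constraint (5) also keeps every $m_l$ in (9) non-negative, since each label of $y$ then overlaps at least one label of $x$.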

Once the optimal solution of this ILP has been found, the
variables $a_{kl}$ can be used to determine which labels got split
and merged, and thus to localize errors.

3. RESULTS 

Shift of Object Boundary. To illustrate the behaviour of
different error measures in the case of object boundary
displacements, we created a simple artificial 1D labeling consisting of
two regions. We show the errors of segmentations obtained
by shifting the boundary between the objects. It can clearly
be seen that the TED assigns the same numbers (one split and one
merge error) as soon as a given tolerance criterion is exceeded
(0.025 in this example), regardless of where the error happens.
This is the desired outcome for applications like neuron
segmentation, where it is important to count the number of
topological errors regardless of how many voxels got affected.
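The behaviour described in this experiment can be reproduced on a toy 1D labeling with a brute-force search in place of the ILP. The following is a sketch under several assumptions: distances are measured in index units, each maximal run with a constant pair of labels is treated as one relabelable region (as in Figure 1 (c)), all tolerable relabelings are simply enumerated, and the names `tolerance_sets`, `ted_brute_force`, and `two_regions` are hypothetical. Enumeration is only feasible for very small inputs.

```python
from itertools import product

def splits_merges(x, y):
    """Count splits and merges from the set of overlapping label pairs."""
    pairs = set(zip(x, y))
    splits = len(pairs) - len({k for k, _ in pairs})
    merges = len(pairs) - len({l for _, l in pairs})
    return splits, merges

def tolerance_sets(x, y, threshold):
    """For each region (run of constant (x, y) pair), list the labels of y
    that lie within `threshold` of every location in the region."""
    positions = {}
    for i, label in enumerate(y):
        positions.setdefault(label, []).append(i)
    regions, start = [], 0
    for i in range(1, len(y) + 1):
        if i == len(y) or (x[i], y[i]) != (x[start], y[start]):
            regions.append(range(start, i))
            start = i
    result = []
    for region in regions:
        allowed = sorted(
            label for label, pos in positions.items()
            if all(min(abs(i - j) for j in pos) <= threshold for i in region)
        )
        result.append((region, allowed))
    return result

def ted_brute_force(x, y, threshold, alpha=1.0, beta=1.0):
    """Minimize alpha * splits + beta * merges over tolerable relabelings."""
    sets = tolerance_sets(x, y, threshold)
    best = None
    for choice in product(*(allowed for _, allowed in sets)):
        if set(choice) != set(y):  # every label of y must appear at least once
            continue
        relabeled = list(y)
        for (region, _), label in zip(sets, choice):
            for i in region:
                relabeled[i] = label
        s, m = splits_merges(x, relabeled)
        cost = alpha * s + beta * m
        if best is None or cost < best[0]:
            best = (cost, s, m, relabeled)
    return best

def two_regions(n, boundary, labels):
    """1D labeling of n locations with a single boundary."""
    return [labels[0] if i < boundary else labels[1] for i in range(n)]

x = two_regions(40, 20, ("A", "B"))
for shift in (2, 5):
    y = two_regions(40, 20 + shift, (1, 2))
    cost, s, m, _ = ted_brute_force(x, y, threshold=3)
    print(f"shift={shift}: splits={s}, merges={m}, weighted sum={cost}")
```

With a threshold of 3, the shift of 2 is absorbed by relabeling the displaced strip, while the shift of 5 cannot be relabeled and is reported as one split and one merge, independent of how far beyond the threshold the boundary moved.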

Influence of Distance Threshold. In order to study the effect
of the threshold distance for boundary shifts, we used an
automatic segmentation result^2 and evaluated the TED for
varying thresholds. The TED reveals that most of the errors occur
within the range of about 50nm, corresponding to about 12
pixels in the x-y-plane of this dataset. Depending on the
biological need, those errors might be tolerable. In the same
plot, we show the VOI of the closest tolerable relabeling to
the ground truth under the given boundary shift threshold (i.e.,
the equivalent of Figure 1 (d) on the proposal segmentation).
From this example, we can see that the errors < 50nm
contribute quite significantly with 0.23 bits to the total VOI of
0.886, and thus can shadow true topological errors.

^2 Obtained using Sopnet [5] on a publicly available EM dataset [ ].

(a) ground truth x (b) proposal segmentation y (c) closest tolerable relabeling of y to x

Fig. 3: Errors found by the TED between a human-generated ground truth x (a) and a proposal segmentation y (b), illustrated on
two neurons (purple and red in ground truth). Small errors, such as the one shown in the magnification, are tolerated and consequently
removed in the tolerable relabeling of y (c). Remaining errors are considered real splits (S) and merges (M).


Comparison to RI and VOI. We compare RI and VOI
against TED for three manual modifications of the ground
truth labeling of [7]. For the 10nm shift experiment, we
shifted the boundaries of neurons in the ground truth by
10nm. For the splits and merges experiment, we split and
merged neurons at 10 randomly selected locations,
respectively. It can be seen that the small shifts of object boundaries
can have a significant contribution to the measures RI and
VOI, which confirms our previous observation.
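Why pure boundary shifts register in RI and VOI can be seen from their standard definitions. The sketch below assumes the usual pair-counting form of the Rand index and VOI expressed as H(x|y) + H(y|x) in bits; the 1D toy labelings are illustrative and are not the data used in the experiment.

```python
from collections import Counter
from math import log2

def rand_index(x, y):
    """Fraction of location pairs on which x and y agree
    (same object in both, or different objects in both)."""
    n = len(x)
    total_pairs = n * (n - 1) // 2

    def same_pairs(counts):
        return sum(c * (c - 1) // 2 for c in counts.values())

    both = same_pairs(Counter(zip(x, y)))  # pairs joined in x and in y
    in_x = same_pairs(Counter(x))          # pairs joined in x
    in_y = same_pairs(Counter(y))          # pairs joined in y
    agreements = total_pairs - in_x - in_y + 2 * both
    return agreements / total_pairs

def variation_of_information(x, y):
    """VOI in bits: H(x|y) + H(y|x) = 2 H(x, y) - H(x) - H(y)."""
    n = len(x)

    def entropy(counts):
        return -sum(c / n * log2(c / n) for c in counts.values())

    joint = entropy(Counter(zip(x, y)))
    return 2 * joint - entropy(Counter(x)) - entropy(Counter(y))

def two_regions(n, boundary, labels):
    """1D labeling of n locations with a single boundary."""
    return [labels[0] if i < boundary else labels[1] for i in range(n)]

x = two_regions(40, 20, ("A", "B"))
for shift in (0, 2, 5):
    y = two_regions(40, 20 + shift, (1, 2))
    print(shift, round(rand_index(x, y), 3),
          round(variation_of_information(x, y), 3))
```

Both measures change with every additional location that crosses the boundary, so they grow smoothly with the shift instead of distinguishing tolerable displacements from topological errors.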


Localization of Errors. Due to the explicit tolerance
criterion of the TED, errors can be localized in the volume. In
Figure 3 we show example split and merge errors detected by
the TED on an automatic segmentation result for the SNEMI
dataset [2]. The boundary shift tolerance was set to 100nm,
which corresponds to 16.6 x 16.6 x 3.3 voxels for this volume
with a resolution of 6nm x 6nm x 30nm.
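The conversion from a physical tolerance to voxel units is a per-axis division by the resolution. A minimal sketch, with a hypothetical helper name:

```python
def threshold_in_voxels(threshold_nm, resolution_nm):
    """Convert a distance threshold in nm to voxel units along each axis."""
    return tuple(threshold_nm / r for r in resolution_nm)

# 100nm tolerance at a resolution of 6nm x 6nm x 30nm
print(threshold_in_voxels(100, (6, 6, 30)))
```

For the values above this gives roughly 16.67, 16.67, and 3.33 voxels; the figures quoted in the text are these values cut to one decimal place. Because the volume is anisotropic, the same physical tolerance spans far fewer voxels across sections than within a section.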


4. CONCLUSIONS 

We presented the TED, a novel measure for segmentation
comparison, which tolerates small errors based on an explicit
tolerance criterion.

Although we demonstrated the TED in the domain of
neuron segmentation, our error measure is not intrinsically
limited to this application. In our future work, we will investigate
its use for other computer vision problems, and especially for
the training of algorithms to minimize this error measure.

A current limitation of the TED is the restriction to
local tolerance functions. Although more involved tolerance
criteria could in theory be incorporated into the ILP by adding
auxiliary variables, it remains questionable whether the
resulting problem is still tractable. Although we did not
observe that empirically, even with the current formulation it is
conceivable that an optimal solution to the ILP cannot be
found in reasonable time. This could in particular be the case
if ground truth and proposal segmentation differ a lot and a
very lax tolerance criterion is used. In these cases,
approximate solutions to the proposed ILP might be considered.